[hail] Better scaling on RVD.union #6943

tpoterba · 2019-08-26T19:34:01Z

Do a tree reduce instead of a linear reduce. This means that the Java
stack depth is log2(N) instead of N, which prevents stack overflow errors
when unioning hundreds of tables together.

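For readers skimming the thread, here is a minimal sketch of the difference in nesting depth between the two reduction shapes. It is not the code in this PR: `treeReduce`, `Node`, `Leaf`, and `Union` are hypothetical names used only for illustration, and `Union` is a toy stand-in for whatever structure a real union builds.

```scala
object TreeReduceSketch {
  // Hypothetical helper, not Hail's implementation.
  // Repeatedly combines adjacent pairs until a single element remains,
  // so the result is nested about log2(N) levels deep.
  def treeReduce[A](xs: IndexedSeq[A])(op: (A, A) => A): A = {
    require(xs.nonEmpty, "treeReduce of empty collection")
    var level = xs
    while (level.length > 1) {
      level = level.grouped(2).map {
        case Seq(a, b) => op(a, b) // combine a full pair
        case Seq(a)    => a        // odd element carries over unchanged
      }.toIndexedSeq
    }
    level.head
  }

  // Toy stand-in for a union node that tracks how deeply it is nested.
  sealed trait Node { def depth: Int }
  case object Leaf extends Node { val depth = 0 }
  final case class Union(left: Node, right: Node) extends Node {
    val depth: Int = 1 + math.max(left.depth, right.depth)
  }
}
```

With 200 leaves, `leaves.reduce[Node]((a, b) => Union(a, b))` yields a depth of 199, while `treeReduce(leaves)((a, b) => Union(a, b))` yields a depth of 8.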

patrick-schultz · 2019-08-26T21:20:43Z

I'm confused by the stack depth problem. reduce isn't recursive; it forwards to reduceLeft:

  def reduceLeft[B >: A](op: (B, A) => B): B = {
    if (isEmpty)
      throw new UnsupportedOperationException("empty.reduceLeft")

    var first = true
    var acc: B = 0.asInstanceOf[B]

    for (x <- self) {
      if (first) {
        acc = x
        first = false
      }
      else acc = op(acc, x)
    }
    acc
  }

tpoterba · 2019-08-26T21:53:01Z

The problem is that in the ordered merge usage, the Spark DAG builds up a stack of 200 RDDs / iterators.
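A rough illustration of that point, using plain Scala iterators rather than Spark RDDs: the reduce loop itself is flat, but each lazy merge wraps its inputs, so pulling one element from the outermost iterator descends through every layer. Everything here (`mergeSorted`, `treeMerge`, `DepthProbe`, `leaf`) is a hypothetical stand-in, not Hail or Spark code.

```scala
object MergeDepthSketch {
  // Lazily merges two sorted iterators. Each call adds one wrapper layer
  // that is only traversed when elements are pulled.
  def mergeSorted(l: Iterator[Int], r: Iterator[Int]): Iterator[Int] = {
    val (a, b) = (l.buffered, r.buffered)
    new Iterator[Int] {
      def hasNext: Boolean = a.hasNext || b.hasNext
      def next(): Int =
        if (!b.hasNext || (a.hasNext && a.head <= b.head)) a.next() else b.next()
    }
  }

  // Balanced alternative: split the (non-empty) inputs in half and merge
  // the halves, giving about log2(N) wrapper layers.
  def treeMerge(its: IndexedSeq[Iterator[Int]]): Iterator[Int] =
    if (its.length == 1) its.head
    else {
      val (l, r) = its.splitAt(its.length / 2)
      mergeSorted(treeMerge(l), treeMerge(r))
    }

  // Records the deepest JVM stack observed when a leaf is pulled from.
  final class DepthProbe { var max = 0 }

  def leaf(xs: Seq[Int], probe: DepthProbe): Iterator[Int] = new Iterator[Int] {
    private val it = xs.iterator
    def hasNext: Boolean = it.hasNext
    def next(): Int = {
      probe.max = math.max(probe.max, Thread.currentThread.getStackTrace.length)
      it.next()
    }
  }
}
```

Draining `inputs.reduceLeft[Iterator[Int]]((acc, x) => mergeSorted(acc, x))` and `treeMerge(inputs)` produces the same sorted output, but the probe should record a much deeper stack for the left-nested chain.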

patrick-schultz · 2019-08-26T22:12:41Z

Ah, right, that stack.

[hail] Better scaling on RVD.union

12772ac

Do a tree reduce instead of a linear reduce. This means that the java stack depth is log2(N) instead of N, and prevents stack overflow errors when unioning hundreds of tables together.

tpoterba assigned patrick-schultz Aug 26, 2019

patrick-schultz approved these changes Aug 26, 2019


danking merged commit 990e875 into hail-is:master Aug 26, 2019
